Some Properties of Maximum Likelihood Strategy
for Re-Pairing Broken Random Sample

By Prem K. Goel & T. Ramalingam
The Ohio State University and Northern Illinois University


1. Introduction. An important tool for analyzing economic policies is the
microanalytic model. Many Federal agencies use such models for the evaluation of
policy proposals. When all the input data for the model come from a single source,
the quality of the model depends, among other things, on how complete the information
is on jointly observed variables. Often, the input for the model consists of data from
more than one Federal agency. For instance, to make up for 'gaps' that occur in the
decennial Census, the Bureau of the Census and the Internal Revenue Service
provide marginal information on variables. However, joint information on these
variables is not available to either of the two agencies. In such cases, Federal
statisticians use file-merging methodology in order to produce comprehensive data on
variables of interest. A review of the origin, progress and recent developments of this
methodology is given in Radner et al. (1980).

A unified framework for all such models for the file-merging methodology and
statistical properties of some of them are given in Ramalingam (1985) and Goel and
Ramalingam (1985). One useful model for obtaining matched pairs, introduced by
DeGroot, Feder and Goel (1971), is as follows. Let $W_i = (T_i, U_i)$, $i = 1, 2, \ldots, n$, be iid
random vectors which are not observable as $(t, u)$ pairs. Instead, it is assumed that the
marginal data on $t$ and $u$ are available on these $n$ individuals as follows.

File 1: $x_1, x_2, \ldots, x_n$, which is an unknown permutation of the unobserved
values $t_1, \ldots, t_n$.

File 2: $u_1, u_2, \ldots, u_n$.

Thus the data in File 1 are available at one agency and the data in File 2 are available
at the other agency. Clearly, what is missing from the conceptually unobserved values
of $(t, u)$ is the pairing which identifies the $t_i$ and $u_j$ that pertain to the same individual.
DeGroot, Feder and Goel (1971) call the marginal observed data $x_1, \ldots, x_n$; $u_1, \ldots, u_n$ a
Broken Random Sample from the population of $(T, U)$.
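To make the data structure concrete, the following Python sketch builds a broken random sample from simulated pairs. It is only an illustration: the function name `make_broken_sample`, the standard normal draws and the seed are assumptions made here and are not part of the model of DeGroot, Feder and Goel (1971).

```python
import random

def make_broken_sample(pairs, rng):
    """Split paired data (t_i, u_i) into two files and hide the pairing.

    File 1 holds the t-values in a random (unknown) order; File 2 holds
    the u-values in their original order, as in the model described above.
    """
    t_values = [t for t, _ in pairs]
    u_values = [u for _, u in pairs]
    file1 = t_values[:]      # x_1, ..., x_n
    rng.shuffle(file1)       # unknown permutation of t_1, ..., t_n
    return file1, u_values

# Hypothetical example: five pairs drawn from independent standard normals.
rng = random.Random(0)
pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5)]
file1, file2 = make_broken_sample(pairs, rng)
```

The two returned lists carry exactly the marginal information of the original pairs; only the link between a t-value and its u-value is lost.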

In this paper, we shall derive some statistical properties of known strategies to
merge File 1 and File 2 in order to reconstruct paired data on $(T_i, U_i)$ for the bivariate
matching problem in which both $T$ and $U$ are one-dimensional variables. We shall
begin with some notation.

1.1 Notation. Let $(T, U)$ have an absolutely continuous joint CDF $H(t, u)$ and joint
density $h(t, u)$. The marginal distribution functions of $T$ and $U$ will be denoted by $F(\cdot)$
and $G(\cdot)$ respectively, and $I[\cdot]$ will denote the indicator function of the event in brackets.

Let $F_n(x) = (1/n) \sum_i I[T_i \le x]$ denote the empirical CDF based on the
variables $T_1, \ldots, T_n$. Similarly, $G_n(x)$ denotes the empirical CDF based on $U_1, \ldots, U_n$.

Let $R(i) = \sum_\alpha I[T_\alpha \le T_i]$ denote the rank of $T_i$, $i = 1, 2, \ldots, n$. Similarly, $S(1), \ldots, S(n)$
denote the ranks of the variables $U_1, U_2, \ldots, U_n$.

Let $\varphi = (\varphi(1), \ldots, \varphi(n))$ be a permutation of the integers $1, 2, \ldots, n$. The set of all $n!$
permutations of $1, 2, \ldots, n$ will be denoted by $\Psi$. Let $\varphi^* = (1, 2, \ldots, n)$ denote the identity
permutation.

Let $\epsilon > 0$. For all $i = 1, 2, \ldots, n$ and $\varphi \in \Psi$, define events $A_{ni}(\varphi, \epsilon)$ and $A_{ni}(\epsilon)$ as
follows:

    $A_{ni}(\varphi, \epsilon) = \{\, |U_{(\varphi(R(i)))} - U_i| \le \epsilon \,\}$,    (1.1)

    $A_{ni}(\epsilon) = A_{ni}(\varphi^*, \epsilon)$.    (1.2)

For all $1 \le j, k \le n$, let

    $\xi_{1jk} = I[U_j - U_k > \epsilon] - I[T_j - T_k > 0]$,    (1.3)

    $\xi_{2jk} = I[T_j - T_k > 0] - I[U_j - U_k \ge -\epsilon]$,    (1.4)

and let $\beta_1 \overset{d}{=} \beta_2$ denote that the vectors $\beta_1$ and $\beta_2$ have identical distributions.

2. A Class of Matching Problems. Suppose that $h(t, u)$ has the monotone
likelihood ratio (MLR) property. That is, for all reals $t_1 < t_2$ and $u_1 < u_2$, we have

    $h(t_1, u_1)\, h(t_2, u_2) \ge h(t_1, u_2)\, h(t_2, u_1)$.    (2.1)
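As a quick illustration of the MLR inequality, the sketch below checks it on a small grid for a bivariate normal density with unit variances. The density, the grid and the helper names `bvn_density` and `satisfies_mlr` are assumptions chosen here for illustration; for this density the ratio of the two sides equals $\exp\{\rho (t_2 - t_1)(u_2 - u_1)/(1 - \rho^2)\}$, so the property should hold when $\rho \ge 0$ and fail when $\rho < 0$.

```python
import math

def bvn_density(t, u, rho):
    """Bivariate normal density with zero means, unit variances, correlation rho."""
    q = (t * t - 2.0 * rho * t * u + u * u) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def satisfies_mlr(density, grid):
    """Check h(t1,u1) h(t2,u2) >= h(t1,u2) h(t2,u1) for all t1 < t2, u1 < u2 on the grid."""
    for t1 in grid:
        for t2 in grid:
            for u1 in grid:
                for u2 in grid:
                    if t1 < t2 and u1 < u2:
                        lhs = density(t1, u1) * density(t2, u2)
                        rhs = density(t1, u2) * density(t2, u1)
                        if lhs < rhs * (1.0 - 1e-12):  # small slack for rounding
                            return False
    return True

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
holds_for_positive_rho = satisfies_mlr(lambda t, u: bvn_density(t, u, 0.5), grid)
holds_for_negative_rho = satisfies_mlr(lambda t, u: bvn_density(t, u, -0.5), grid)
```

A finite grid can only refute the property, not prove it; the closed-form ratio above is what settles the normal case.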

If the broken random sample $x_1, \ldots, x_n$; $u_1, \ldots, u_n$ comes from $h(t, u)$, a typical
'matching strategy' based on a permutation $\varphi \in \Psi$ can be described by pairing $x_{(i)}$ with
$u_{(\varphi(i))}$. Generalizing the results of DeGroot, Feder & Goel (1971), Chew (1973) showed
that if the MLR property (2.1) holds, then the strategy which maximizes the likelihood
$\prod_i h(x_{(i)}, u_{(\varphi(i))})$ of the parameter $\varphi$ over $\Psi$ is to pair the $i$th smallest $x$ with the $i$th smallest
$u$. Note that, though the pairings in the unobserved sample $(T_i, U_i)$, $i = 1, 2, \ldots, n$, are
unavailable, the order statistics of the marginal data on $X$ and $U$ are respectively the
same as the ordered values of $T$ and $U$. Hence, we can write the merged file on $(T, U)$
due to any strategy $\varphi$ as

    $(T_{(i)}, U_{(\varphi(i))})$, $i = 1, 2, \ldots, n$.    (2.2)

Consequently, the merged file based on the maximum likelihood pairing (MLP)
mentioned above is obtained by letting $\varphi = \varphi^*$ in (2.2).
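The merged file (2.2) is easy to sketch in code. The functions below are illustrative only: the names `merge_with` and `mlp_merge` are hypothetical, and permutations are passed as 1-based lists to mirror the notation $\varphi = (\varphi(1), \ldots, \varphi(n))$.

```python
def merge_with(file1, file2, phi):
    """Merged file (2.2): pair the i-th smallest x with the phi(i)-th smallest u.

    phi is a permutation of 1..n given as a list of 1-based integers.
    """
    xs, us = sorted(file1), sorted(file2)
    return [(xs[i], us[phi[i] - 1]) for i in range(len(xs))]

def mlp_merge(file1, file2):
    """Maximum likelihood pairing under MLR: phi is the identity permutation."""
    return list(zip(sorted(file1), sorted(file2)))

# Hypothetical toy files.
merged_identity = mlp_merge([3, 1, 2], [30, 10, 20])
merged_reversed = merge_with([3, 1, 2], [30, 10, 20], [3, 2, 1])
```

Because only the order statistics of each file enter, the result does not depend on the order in which either file happens to be stored.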

Quality of the Merged File. Ideally, we would like to select a $\varphi$ for which the file
in (2.2) recovers all $(T, U)$ pairs in the original unobserved data. It is therefore natural to
consider the random variable $N(\varphi)$, the number of correct matches due to $\varphi$, as an
indicator of the performance of the matching (merging) strategy $\varphi$. The optimality of $\varphi^*$
subject to various criteria, e.g., maximizing the expected number of correct matches,
$E(N(\varphi))$, is discussed in Ramalingam (1985).

Situations often arise where it is not crucial that, after the two files are merged,
the matched pairs be exactly the same as the pairs of the original data. For example,
when contingency-table analyses are contemplated for grouped data on the continuous
variables $T$ and $U$, then, in the absence of knowledge of the pairings, we would like
to reconstruct the pairs but would not worry too much as long as the u-value in any
matched pair came within a pre-fixed tolerance $\epsilon$ (a non-negative number) of the true
u-value that we would get with the ideal matching which recovers all the original pairs.
This type of 'approximate matching' was first introduced by Yahav (1982), who defined
$\epsilon$-correct matching as follows.

Definition 1 (Yahav). A pair $(T_{(i)}, U_{(\varphi(i))})$ in the merged file (2.2) is said to be $\epsilon$-correct
if $|U_{(\varphi(i))} - U_{[i]}| \le \epsilon$, where $\epsilon > 0$ and $U_{[i]}$ is the concomitant of $T_{(i)}$; that is, the
true u-value that was paired with $T_{(i)}$ in the original sample.

The number of $\epsilon$-correct matches, $N(\varphi, \epsilon)$, in the merged file (2.2) is given by

    $N(\varphi, \epsilon) = \sum_i I[\, |U_{(\varphi(i))} - U_{[i]}| \le \epsilon \,]$.    (2.3)

Note that as $\epsilon \downarrow 0$, $N(\varphi, \epsilon)$ converges (almost surely) to $N(\varphi)$, the number of exact
matches.
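When the true pairs are known, as in a simulation, the count of $\epsilon$-correct matches can be computed directly from the order statistics of the u-values and the concomitants of the ordered t-values. The sketch below assumes distinct t-values; the function name `count_eps_correct` and the toy data are hypothetical.

```python
def count_eps_correct(pairs, phi, eps):
    """Number of eps-correct matches for fully observed pairs (t_i, u_i).

    phi is a permutation of 1..n given as a list of 1-based integers.
    """
    by_t = sorted(pairs, key=lambda p: p[0])      # units ordered by t
    concomitants = [u for _, u in by_t]           # U_[1], ..., U_[n]
    u_sorted = sorted(u for _, u in pairs)        # U_(1), ..., U_(n)
    return sum(
        abs(u_sorted[phi[i] - 1] - concomitants[i]) <= eps
        for i in range(len(pairs))
    )

# Hypothetical toy data: the two smallest t-values carry swapped u-ranks.
toy_pairs = [(1.0, 0.2), (2.0, 0.1), (3.0, 0.9)]
identity = [1, 2, 3]
```

With these toy pairs the identity pairing recovers one pair exactly, and the other two are off by 0.1, so they count as matches only once the tolerance reaches that size.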

The counts $N(\varphi)$ and $N(\varphi, \epsilon)$ are useful indices reflecting the reliability of the
merged file (2.2) resulting from $\varphi$. We shall now derive some statistical properties of
$N(\varphi^*, \epsilon)/n$. Because Federal files often consist of a large number of
records, these asymptotic investigations are useful in practice.
3. Asymptotic behavior of $N(\varphi^*, \epsilon)$. We first establish a representation for $N(\varphi, \epsilon)$
as a sum of exchangeable 0-1 random variables. This representation will lead to an
easy proof of the convergence in probability of the proportion, $N(\varphi^*, \epsilon)/n$, of $\epsilon$-correct
matches due to the MLP strategy. The following lemma (see Randles and Wolfe (1979),
Theorem 1.3.7, page 16) will be needed.

Lemma 1. If $\xi \overset{d}{=} \nu$ and $K(\cdot)$ is a measurable function (possibly vector valued) defined
on the common support of these random vectors, then $K(\xi) \overset{d}{=} K(\nu)$.

Proposition 1. Let $A_{ni}(\varphi, \epsilon)$ and $N(\varphi, \epsilon)$ be given by (1.1) and (2.3) respectively. Then,
for all $\varphi \in \Psi$,

    $N(\varphi, \epsilon) = \sum_i I[A_{ni}(\varphi, \epsilon)]$,    (3.1)

where the summands $I[A_{ni}(\varphi, \epsilon)]$ are exchangeable binary variables.

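Proof. The order statistic $U_{(\varphi(i))}$ and the concomitant $U_{[i]}$ of $T_{(i)}$ used in (2.3) can be
written in terms of the ranks of the T's and U's as follows:

    $U_{(\varphi(i))} = \sum_\alpha U_\alpha\, I[R_{2\alpha} = \varphi(i)]$,    (3.2)

    $U_{[i]} = \sum_\alpha U_\alpha\, I[R_{1\alpha} = i]$,    (3.3)

where $R_{1\alpha} = R(\alpha)$ and $R_{2\alpha} = S(\alpha)$ are the ranks of $T_\alpha$ and $U_\alpha$ respectively.
Note that $N(\varphi, \epsilon)$ is simply a count of how many pairs in the merged file based on $\varphi$, as
defined in (2.2), satisfy

    $|U_{(\varphi(i))} - U_{[i]}| \le \epsilon$.    (3.4)

If (3.4) holds for some $i$, then there exists a $j$, namely the index with $R(j) = i$, such that

    $|U_{(\varphi(R(j)))} - U_j| \le \epsilon$.    (3.5)

In view of the continuity of $(T_j, U_j)$, this correspondence is one-to-one. Therefore, the
count $N(\varphi, \epsilon)$ is the same as the count given by

    $N(\varphi, \epsilon) = \sum_\alpha I[\, |U_{(\varphi(R(\alpha)))} - U_\alpha| \le \epsilon \,]$.    (3.6)

Hence, (3.1) follows from (3.6) and the definition of $A_{ni}$ in (1.1).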
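The re-indexing used in this part of the proof can be checked numerically: counting over positions $i$ of the merged file, as in (2.3), and counting over the original units $\alpha$ through their t-ranks, as in (3.6), should give the same number. The sketch below is an illustration under assumed uniform data with distinct values; the function names are hypothetical.

```python
import random

def count_by_concomitants(pairs, phi, eps):
    """Count as in (2.3): loop over positions i of the merged file."""
    by_t = sorted(pairs, key=lambda p: p[0])
    u_sorted = sorted(u for _, u in pairs)
    return sum(
        abs(u_sorted[phi[i] - 1] - by_t[i][1]) <= eps
        for i in range(len(pairs))
    )

def count_by_ranks(pairs, phi, eps):
    """Count as in (3.6): loop over original units and use the rank of each t."""
    t_values = [t for t, _ in pairs]
    u_sorted = sorted(u for _, u in pairs)
    total = 0
    for t_a, u_a in pairs:
        rank = sum(t <= t_a for t in t_values)          # R(alpha), 1-based
        total += abs(u_sorted[phi[rank - 1] - 1] - u_a) <= eps
    return total

rng = random.Random(1)
pairs = [(rng.random(), rng.random()) for _ in range(8)]
phi = list(range(1, 9))
rng.shuffle(phi)                                         # an arbitrary permutation
counts = [(count_by_concomitants(pairs, phi, e), count_by_ranks(pairs, phi, e))
          for e in (0.05, 0.2, 0.5)]
```

Each tuple in `counts` holds the two versions of the count for one tolerance; they agree because every unit occupies exactly one position in the t-ordering.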


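In order to show the exchangeability of the summands in (3.1), note that the
original samples are independent and identically distributed vectors. Therefore

    $(W_{\alpha(1)}, W_{\alpha(2)}, \ldots, W_{\alpha(n)}) \overset{d}{=} (W_1, W_2, \ldots, W_n)$,    (3.7)

where $(\alpha(1), \alpha(2), \ldots, \alpha(n))$ is an arbitrary permutation of $(1, 2, \ldots, n)$.

Define a function $f = (f_1, f_2, \ldots, f_n)$ from $\mathbb{R}^{2n}$ to $\{0, 1\}^n$ by

    $f_j = \begin{cases} 1 & \text{if } \sum_i I[b_j - b_i > \epsilon] < \varphi\big(\sum_i I[a_j - a_i \ge 0]\big) \le \sum_i I[b_j - b_i \ge -\epsilon], \\ 0 & \text{otherwise,} \end{cases}$    (3.8)

for $j = 1, 2, \ldots, n$, where $(a_1, b_1, \ldots, a_n, b_n)$ is an arbitrary point in $\mathbb{R}^{2n}$ and $\varphi \in \Psi$. It follows
from (3.7) and Lemma 1 that

    $f(W_{\alpha(1)}, W_{\alpha(2)}, \ldots, W_{\alpha(n)}) \overset{d}{=} f(W_1, W_2, \ldots, W_n)$.    (3.9)

Fix $j \in \{1, 2, \ldots, n\}$. Then, using (3.8), we see that $f_j(W_{\alpha(1)}, W_{\alpha(2)}, \ldots, W_{\alpha(n)})$ is the
indicator function of the event

    $\sum_i I[U_{\alpha(j)} - U_i > \epsilon] < \varphi\big(\sum_i I[T_{\alpha(j)} - T_i \ge 0]\big) \le \sum_i I[U_{\alpha(j)} - U_i \ge -\epsilon]$

or, equivalently, in terms of the ranks $R_{11}, \ldots, R_{1n}$ of the T's and the empirical CDF
$G_n(\cdot)$ of the U's,

    $G_n(U_{\alpha(j)} - \epsilon) < \varphi(R_{1\alpha(j)})/n \le G_n(U_{\alpha(j)} + \epsilon)$.

Since $G_n^{-1}(k/n) = U_{(k)}$, $k = 1, 2, \ldots, n$, it follows that $f_j(W_{\alpha(1)}, W_{\alpha(2)}, \ldots, W_{\alpha(n)})$ is 1 iff
$|U_{(\varphi(R_{1\alpha(j)}))} - U_{\alpha(j)}| \le \epsilon$. Consequently,

    $f_j(W_{\alpha(1)}, W_{\alpha(2)}, \ldots, W_{\alpha(n)}) = I[A_{n\alpha(j)}(\varphi, \epsilon)]$.    (3.10a)

Similarly,

    $f_j(W_1, \ldots, W_n) = I[A_{nj}(\varphi, \epsilon)]$.    (3.10b)

The exchangeability of the summands in (3.1) follows from (3.9), (3.10a) and (3.10b).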

We shall now review some results concerning $E[N(\epsilon)/n]$, due to Yahav (1982),
where $N(\epsilon) \equiv N(\varphi^*, \epsilon)$. Assuming that the distribution of $T$ and $U$ satisfies:

    the conditional distribution of $U$ given that $T = t$ is (univariate) normal with
    mean $t$ and variance 1,

Yahav (1982) derived the limiting value of $\mu_n(\epsilon) = E[N(\epsilon)/n]$ as $n \to \infty$ by using the
representation (2.3), in which the summands are functions of the order statistics of
$U_1, \ldots, U_n$ and the concomitants of the order statistics of $T_1, \ldots, T_n$. His proof relied on
an approximation theorem, about the order statistics for the above model, given in
Bickel and Yahav (1977). Furthermore, he also reported the findings of a Monte-Carlo
study of $\mu_n(\epsilon)$ in a particular case of his model, namely, $T$ and $U$ bivariate normal
with correlation $\rho$.

We now establish the large-sample behavior of $N(\epsilon)/n$ for samples from
an arbitrary population. The properties of its expected value follow as a consequence.
In Section 4, we indicate how Yahav's simulation study of the small-sample properties
of $\mu_n(\epsilon)$ can be improved upon. We shall then present the results of our Monte-Carlo
study of $\mu_n(\epsilon)$ when $n$ is small.

Theorem 1. For broken random samples from an absolutely continuous distribution,

    $N(\epsilon)/n \xrightarrow{pr} \mu(\epsilon)$, as $n \to \infty$,    (3.11)

where

    $\mu(\epsilon) = P[\, G(U - \epsilon) \le F(T) \le G(U + \epsilon) \,]$.    (3.12)

Proof. Let $L_n = N(\epsilon)/n$. Using the definition of $A_{ni}(\epsilon)$ in (1.2) and the representation
(3.1) for $N(\epsilon)$ as a sum of exchangeable binary variables, we obtain

    $N(\epsilon) = \sum_i I[A_{ni}(\epsilon)]$.    (3.13)

It follows that

    $E(L_n) = n P(A_{n1}(\epsilon))/n = P(A_{n1}(\epsilon))$.    (3.14)

Note that

    $E(L_n^2) = n^{-2} [\, E(N(\epsilon))_{(2)} + E(N(\epsilon)) \,]$,    (3.15)

where $E(N(\epsilon))_{(2)}$ is the second factorial moment of $N(\epsilon)$. Using the representation
(3.13), we get

    $E(L_n^2) = n^{-2} [\, n_{(2)} P\{A_{n1}(\epsilon) A_{n2}(\epsilon)\} + n P(A_{n1}(\epsilon)) \,]$,

where $n_{(2)} = n(n-1)$. For $\alpha = 1, 2, \ldots, n$ and $j = 1, 2$, let

    $V_{j\alpha} = \sum_i \xi_{j\alpha i}$,    (3.16)

where the sequences $\{\xi_{1\alpha i}\}$ and $\{\xi_{2\alpha i}\}$ are defined in (1.3) and (1.4). It follows that

    $A_{n1}(\epsilon) = (V_{11}/n \le 0,\ V_{21}/n \le 0)$    (3.17)

and

    $A_{n1}(\epsilon) A_{n2}(\epsilon) = \bigcap_{i=1}^{2} \bigcap_{j=1}^{2} (V_{ij}/n \le 0)$.    (3.18)




Note that, given $W_1 = (t_1, u_1)$, the infinite sequence $\xi_{112}, \xi_{113}, \ldots$ is exchangeable.
Hence, by the Strong Law of Large Numbers for exchangeable random variables (see
Chow and Teicher, 1978, p. 223),

    $V_{11}/n \to E(\xi_{112} \mid W_1 = (t_1, u_1))$ a.s. as $n \to \infty$,

where the conditional expectation is equal to $G(u_1 - \epsilon) - F(t_1)$. It follows that

    $V_{11}/n \to G(U_1 - \epsilon) - F(T_1)$, a.s. as $n \to \infty$.    (3.19)

We can show by similar arguments that

    $V_{1\alpha}/n \to G(U_\alpha - \epsilon) - F(T_\alpha)$, a.s.    (3.20)

and

    $V_{2\alpha}/n \to F(T_\alpha) - G(U_\alpha + \epsilon)$, a.s.,    (3.21)

where $\alpha = 1, 2$. Using the fact (see Serfling, 1980, p. 52) that a sequence of vectors
converges almost surely to a given vector iff the componentwise sequences converge
almost surely to the appropriate components of the limit, we get from (3.20) and (3.21)


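    $\begin{pmatrix} V_{11}/n \\ V_{21}/n \\ V_{12}/n \\ V_{22}/n \end{pmatrix} \to \begin{pmatrix} G(U_1 - \epsilon) - F(T_1) \\ F(T_1) - G(U_1 + \epsilon) \\ G(U_2 - \epsilon) - F(T_2) \\ F(T_2) - G(U_2 + \epsilon) \end{pmatrix}$ a.s.    (3.22)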

It follows from (3.17), (3.18), (3.22) and the independence of $W_1$ and $W_2$ that

    $P(A_{n1}(\epsilon)) \to \mu(\epsilon)$    (3.23)

and

    $P(A_{n1}(\epsilon) A_{n2}(\epsilon)) \to \mu^2(\epsilon)$.    (3.24)

Therefore (3.14), (3.15), (3.23) and (3.24) imply that, as $n \to \infty$,

    $E(L_n) \to \mu(\epsilon)$    (3.25)

and

    $Var(L_n) \to 0$.    (3.26)

It is well known (by Chebyshev's inequality) that (3.25) and (3.26) imply the convergence
in probability in (3.11).
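The convergence in Theorem 1 can be illustrated with a small simulation. The sketch below assumes, purely for convenience, that $T$ and $U$ are independent and uniform on (0, 1); in that case $F$ and $G$ are the identity on (0, 1) and (3.12) reduces to $\mu(\epsilon) = P(|T - U| \le \epsilon) = 2\epsilon - \epsilon^2$. The function names, sample sizes and seed are hypothetical choices.

```python
import random

def proportion_eps_correct(ts, us, eps):
    """N(eps)/n under the identity permutation (i-th smallest t with i-th smallest u)."""
    n = len(ts)
    u_sorted = sorted(us)
    order = sorted(range(n), key=lambda i: ts[i])   # units listed in increasing t
    hits = sum(abs(u_sorted[r] - us[i]) <= eps for r, i in enumerate(order))
    return hits / n

def simulate(n, eps, seed):
    """One broken random sample of size n from independent uniforms (an assumption)."""
    rng = random.Random(seed)
    ts = [rng.random() for _ in range(n)]
    us = [rng.random() for _ in range(n)]
    return proportion_eps_correct(ts, us, eps)

eps = 0.1
limit = 2 * eps - eps ** 2      # mu(eps) for independent uniforms
estimates = {n: simulate(n, eps, seed=2) for n in (100, 1000, 10000)}
```

Comparing the entries of `estimates` with `limit` shows the proportion settling near $\mu(\epsilon)$ as $n$ grows, even though the two variables are unrelated and almost no pair is matched exactly.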

The following corollary generalizes Yahav's result concerning $\mu_n(\epsilon)$, the first
moment of $N(\epsilon)/n$.

Corollary 1. For $p > 0$,

(i) $N(\epsilon)/n \xrightarrow{L_p} \mu(\epsilon)$ as $n \to \infty$;

(ii) $E[(N(\epsilon)/n)^p] \to [\mu(\epsilon)]^p$ as $n \to \infty$.

Proof. The number of $\epsilon$-correct matches can be at most $n$, the number of pairs in the
unobserved bivariate data. Therefore $0 \le N(\epsilon)/n \le 1$ for all $n = 2, 3, \ldots$, and $\{N(\epsilon)/n\}$ is a
uniformly bounded sequence of random variables. It is well known that convergence in
probability and $L_p$-convergence are equivalent for such sequences. Hence (i) follows
easily from Theorem 1. Now, (ii) readily follows from (i) because $[\mu(\epsilon)]^p$ is finite for
$p > 0$.


Note that our results, unlike Yahav's, make no assumption about the conditional
distribution of $U$ given $T$.


4. Small Sample behavior of $N(\epsilon)$. Yahav used simulated samples from a
bivariate normal population with mean vector 0 and covariance matrix

    $\Sigma = (1 - \rho^2)^{-1}$ [matrix entries missing in the source]    (4.1)

to study small-sample properties of $\mu_n(\epsilon)$. It is important to note that in (4.1), the
variances of $T$ and $U$ are functions of their correlation, $\rho$. This is so because Yahav's
model requires that the conditional distribution of $U$ given $T = t$ be normal with mean $t$
and variance 1. The limiting value of $\mu_n(\epsilon)$ for this particular model is given by


    $\mu(\epsilon) = \int \{ \Phi(x\, a(\rho) + \epsilon/\rho) - \Phi(x\, a(\rho) - \epsilon/\rho) \}\, d\Phi(x)$,    (4.2)

where $a(\rho) = [(1 - \rho)/(1 + \rho)]^{1/2}$ and $\Phi$ is the standard normal CDF.


Yahav computed $\mu(\epsilon)$ by numerical integration for $\epsilon$ = 0.01, 0.05, 0.1 and 0.3.
However, it can be shown that (4.2) simplifies to

    $\mu(\epsilon) = 1 - 2\,\Phi[\, -((1 + \rho)/2)^{1/2}\, \epsilon/\rho \,]$.    (4.3)
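The simplification can be checked numerically. The sketch below takes the integrand of (4.2) with $a(\rho) = [(1 - \rho)/(1 + \rho)]^{1/2}$ and tolerance $\epsilon/\rho$, integrates it by the trapezoidal rule, and compares the result with the closed form $1 - 2\Phi[-((1 + \rho)/2)^{1/2} \epsilon/\rho]$, which follows because the integral is $P(|Z' - a(\rho) X| \le \epsilon/\rho)$ for independent standard normals $Z'$ and $X$. The function names, integration limits and step count are assumptions made for this illustration.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mu_integral(eps, rho, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoidal approximation of the integral in (4.2)."""
    a = math.sqrt((1.0 - rho) / (1.0 + rho))
    c = eps / rho
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (std_normal_cdf(x * a + c) - std_normal_cdf(x * a - c)) * std_normal_pdf(x)
    return total * h

def mu_closed_form(eps, rho):
    """Closed form obtained from Var(Z' - a X) = 1 + a^2 = 2 / (1 + rho)."""
    return 1.0 - 2.0 * std_normal_cdf(-math.sqrt((1.0 + rho) / 2.0) * eps / rho)

checks = [(mu_integral(e, r), mu_closed_form(e, r))
          for e, r in ((0.1, 0.5), (0.3, 0.9), (0.05, 0.3))]
```

The closed form also makes the behavior in $\rho$ visible at a glance: the argument of $\Phi$ is driven by $\epsilon/\rho$, which shrinks as $\rho$ grows.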


Yahav also provided Monte-Carlo estimates of $\mu_n(\epsilon)$ for $n$ = 10, 20, 50 and 100
using the simulated data on $T$ and $U$. Table 4.1 is a typical example of his
results.
Table 4.1 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = .01
[Yahav (1982)]

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$
.001    .5864                  .5326                  .5275                  .5227
.01     .1984                  .1648                  .1271                  .1152
.10     .1512                  .1058                  .0760                  .0591
.30     .1084                  .0686                  .0389                  .0214
.50     .1020                  .0582                  .0272                  .0138
.70     .0960                  .0614                  .0262                  .0105
.90     .0972                  .0540                  .0206                  .0086
.95     .0976                  .0496                  .0214                  .0083
.99     .0960                  .0484                  .0213                  .0080

It is clear from Table 4.1 and equation (4.3) that $\mu_n(\epsilon)$ and $\mu(\epsilon)$ decrease as $\rho$
ranges from 0.001 to 0.99. In fact, (4.3) implies that $\mu(\epsilon) = 1 - 2\Phi(-\epsilon)$ for $\rho = 1.0$ and
$\mu(\epsilon) = 1.0$ for $\rho = 0$, which goes against intuition. One expects that for an optimal strategy,
such as $\varphi^*$, $\mu_n(\epsilon)$ as well as $\mu(\epsilon)$ must be monotone increasing in $\rho$. The problem here
is not with the MLP $\varphi^*$, but with the covariance matrix $\Sigma$, defined by (4.1), used in
Yahav's model: as $\rho$ changes its value, so do the marginal variances of $T$
and $U$. In fact, as $\rho \to 1$, the marginal variances tend to infinity. To rectify this problem,
we have assumed a bivariate normal model for $T$ and $U$ with means zero, variances one and
correlation $\rho$.

For each combination of four values of $n$, namely 10, 20, 50 and 100, and
twelve values of $\rho$, namely 0.00, 0.10 (0.10) 0.90, 0.95, 0.99, 1000 samples were
generated from the bivariate normal population using the IMSL Library routines.
These data were used to obtain Monte-Carlo estimates of $\mu_n(\epsilon)$, where $\epsilon$ was given
the values 0.01, 0.05, 0.1, 0.3, 0.5, 0.75, 1.0.
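A Monte-Carlo study of this kind can be sketched with the Python standard library in place of the IMSL routines. Everything below is an assumption made for illustration: the function name `estimate_mu_n`, the seed, the construction $U = \rho T + (1 - \rho^2)^{1/2} Z$ for a unit-variance bivariate normal pair, and the choice of 1000 replications per setting.

```python
import math
import random

def estimate_mu_n(n, rho, eps, reps, rng):
    """Average proportion of eps-correct matches under the identity pairing."""
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(reps):
        ts = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # U has unit variance and correlation rho with T.
        us = [rho * t + scale * rng.gauss(0.0, 1.0) for t in ts]
        u_sorted = sorted(us)
        order = sorted(range(n), key=lambda i: ts[i])   # units in increasing t
        hits = sum(abs(u_sorted[r] - us[i]) <= eps for r, i in enumerate(order))
        total += hits / n
    return total / reps

rng = random.Random(3)
mu10_rho_000 = estimate_mu_n(10, 0.0, 0.3, 1000, rng)
mu10_rho_090 = estimate_mu_n(10, 0.9, 0.3, 1000, rng)
```

The two estimates correspond to the $n = 10$, $\epsilon = 0.3$ settings with $\rho = 0.00$ and $\rho = 0.90$, and can be compared with the matching entries of Table A.4.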

It is easy to show that, for the above model,

    $\mu(\epsilon) = P(|Z| \le \epsilon\, (2(1 - \rho))^{-1/2})$,    (4.4)

where $Z$ is a standard normal random variable. It is clear from (4.4) that $\mu(\epsilon)$ is a
monotone increasing function of $\rho$. Using standard normal CDF tables, $\mu(\epsilon)$ in (4.4)
was computed for each combination of the twelve values of $\rho$ and the seven values of
$\epsilon$ mentioned above. The estimated values of $\mu_n(\epsilon)$ and the limiting value $\mu(\epsilon)$ are
given in Tables A.1-A.7 in the Appendix.
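The limiting value in (4.4) is straightforward to evaluate without tables. The following sketch uses `math.erf` for the standard normal CDF; the function names are hypothetical, and the formula is applied only for $\rho < 1$.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mu_limit(eps, rho):
    """mu(eps) = P(|Z| <= eps * (2 (1 - rho)) ** -0.5) for rho < 1, as in (4.4)."""
    z = eps / math.sqrt(2.0 * (1.0 - rho))
    return 2.0 * std_normal_cdf(z) - 1.0

rhos = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
column = [mu_limit(0.3, rho) for rho in rhos]   # limiting values for eps = 0.3
```

Rounded to three decimals, `mu_limit(0.3, 0.5)` and `mu_limit(0.3, 0.9)` reproduce the corresponding $\mu(\epsilon)$ entries of Table A.4, and the computed column increases with $\rho$.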
Note that, as expected, $\mu_n(\epsilon)$ and $\mu(\epsilon)$ in Tables A.1-A.7 are monotone
increasing functions of $\rho$ for each fixed $\epsilon$. Furthermore, the quality of the merged file is
quite good if we want to reconstruct contingency tables with intervals of size $0.5\sigma$ or
more and the correlation $\rho > 0.5$.


APPENDIX 

Table A.1 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = 0.01

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$   $\mu(\epsilon)$
0.00    0.106                  0.054                  0.025                  0.015                   0.008
0.10    0.113                  0.059                  0.028                  0.017                   0.008
0.20    0.127                  0.068                  0.031                  0.018                   0.008
0.30    0.138                  0.075                  0.034                  0.020                   0.008
0.40    0.155                  0.083                  0.038                  0.023                   0.008
0.50    0.174                  0.095                  0.044                  0.026                   0.008
0.60    0.199                  0.109                  0.061                  0.036                   0.008
0.70    0.231                  0.129                  0.061                  0.036                   0.008
0.80    0.279                  0.162                  0.077                  0.046                   0.016
0.90    0.374                  0.222                  0.109                  0.067                   0.016
0.95    0.476                  0.296                  0.151                  0.094                   0.024
0.99    0.700                  0.521                  0.299                  0.191                   0.056



Table A.4 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = 0.3

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$   $\mu(\epsilon)$
0.00    0.255                  0.208                  0.184                  0.175                   0.166
0.10    0.265                  0.223                  0.195                  0.186                   0.174
0.20    0.284                  0.237                  0.207                  0.197                   0.190
0.30    0.305                  0.253                  0.221                  0.211                   0.197
0.40    0.334                  0.275                  0.240                  0.229                   0.213
0.50    0.363                  0.304                  0.263                  0.250                   0.236
0.60    0.401                  0.336                  0.293                  0.278                   0.266
0.70    0.455                  0.382                  0.337                  0.320                   0.303
0.80    0.532                  0.457                  0.403                  0.386                   0.362
0.90    0.670                  0.593                  0.540                  0.519                   0.497
0.95    0.802                  0.733                  0.689                  0.674                   0.658
0.99    0.978                  0.968                  0.961                  0.961                   0.966

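Table A.5 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = 0.5

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$   $\mu(\epsilon)$
[table entries missing in the source]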


Table A.6 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = 0.75

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$   $\mu(\epsilon)$
0.00    0.468                  0.433                  0.416                  0.409                   0.404
0.10    0.488                  0.454                  0.437                  0.429                   0.425
0.20    0.514                  0.477                  0.461                  0.453                   0.445
0.30    0.539                  0.505                  0.487                  0.480                   0.471
0.40    0.582                  0.542                  0.522                  0.514                   0.503
0.50    0.621                  0.586                  0.560                  0.555                   0.547
0.60    0.662                  0.633                  0.613                  0.606                   0.59
0.70    0.727                  0.694                  0.679                  0.673                   0.668
0.80    0.810                  0.786                  0.772                  0.768                   0.766
0.90    0.919                  0.908                  0.906                  0.904                   0.907
0.95    0.979                  0.976                  0.978                  0.979                   0.982
0.99    1.000                  1.000                  1.000                  1.000                   1.000

Table A.7 Expected Average Number of
$\epsilon$-correct Matchings, $\epsilon$ = 1.0

$\rho$  $\mu_{10}(\epsilon)$   $\mu_{20}(\epsilon)$   $\mu_{50}(\epsilon)$   $\mu_{100}(\epsilon)$   $\mu(\epsilon)$
0.00    …70                    0.545                  0.531                  0.524                   0.522
0.10    …93                    0.566                  0.555                  0.549                   0.547
0.20    …21                    0.595                  0.581                  0.576                   0.570
0.30    …46                    0.622                  0.611                  0.605                   0.605
0.40    …90                    0.664                  0.650                  0.644                   0.627
0.50    …29                    0.707                  0.691                  0.688                   0.683
0.60    …72                    0.753                  0.744                  0.741                   0.737
0.70    …30                    0.812                  0.807                  0.805                   0.803
0.80    …98                    0.889                  0.887                  0.885                   0.886
0.90    …70                    0.970                  0.972                  0.972                   0.975
0.95    …96                    0.996                  0.997                  0.997                   0.998
0.99    …00                    1.000                  1.000                  1.000                   1.000
